AIbase

# Multimodal Speech Processing

Ultravox v0.4 Llama 3.1 70B
MIT
Ultravox is a multimodal speech large language model built on the pre-trained Llama3.1-70B-Instruct and Whisper-medium backbones. It accepts speech and text together as input.
Audio-to-Text, Transformers, Supports Multiple Languages
fixie-ai
79
4
Llama 3 Typhoon v1.5 8B Audio Preview
Typhoon-Audio Preview is a Thai and English audio-language model that takes text and audio as input and produces text as output.
Audio-to-Text, Transformers
scb10x
218
12
© 2025 AIbase